[torch.compile] Use FakeTensors instead of real GPU tensors for single-size compilation#36093
Conversation
Code Review
This pull request introduces an optimization to the torch.compile integration by using FakeTensors instead of real GPU tensors during single-size compilation. This avoids unnecessary GPU memory allocation. The changes are implemented in two parts: create_concrete_args is updated to generate FakeTensors, and InductorStandaloneAdaptor.compile is patched to handle these tensors correctly by reusing the FakeTensorMode, which also serves as a workaround for an upstream PyTorch issue. The implementation is clean, well-commented, and the logic appears sound. I have no major concerns with this change.
Force-pushed from 6f5635a to bcb9e5b
This pull request has merge conflicts that must be resolved before it can be merged.
looks good. please resolve merge conflicts.
Force-pushed from bcb9e5b to d19ac4a
create_concrete_args previously allocated real GPU tensors (via
torch.empty) just to carry shape/stride/dtype/device metadata into
standalone_compile. Switch to FakeTensors under a FakeTensorMode with
a dummy ShapeEnv. (dummy ShapeEnv instead of None is needed to keep
AOTAutogradCache happy)
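The metadata-only allocation described above can be sketched as follows. The shapes and dtype are illustrative (not the ones vLLM's `create_concrete_args` actually uses), and `device="cpu"` is chosen so the sketch runs without a GPU; the real code targets the compilation device.

```python
import torch
from torch._subclasses.fake_tensor import FakeTensor, FakeTensorMode
from torch.fx.experimental.symbolic_shapes import ShapeEnv

# A dummy ShapeEnv (rather than shape_env=None) keeps downstream
# consumers such as AOTAutogradCache happy.
mode = FakeTensorMode(shape_env=ShapeEnv())

# Under the mode, factory functions produce FakeTensors that carry
# shape/stride/dtype/device metadata but allocate no real storage.
with mode:
    fake = torch.empty(16, 4096, dtype=torch.float16, device="cpu")

assert isinstance(fake, FakeTensor)
assert fake.shape == (16, 4096)
```

Because no storage is materialized, passing such tensors into `standalone_compile` avoids the GPU memory that `torch.empty` on a CUDA device would otherwise reserve.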
standalone_compile("from_example_inputs") creates its own FakeTensorMode
internally, which would conflict with our FakeTensors. Work around this
by patching FakeTensorMode in standalone_compile to reuse our mode.
Tracked upstream: pytorch/pytorch#176562
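The workaround amounts to patching the `FakeTensorMode` name that `standalone_compile` resolves internally so any mode it constructs is actually our existing one. A minimal sketch of the pattern, assuming `mock.patch.object` on the module where the name is looked up (the exact target module depends on the PyTorch version; `torch._subclasses.fake_tensor` is patched here only to demonstrate the mechanism):

```python
import torch
from unittest import mock
from torch._subclasses.fake_tensor import FakeTensorMode
from torch.fx.experimental.symbolic_shapes import ShapeEnv

# The mode our FakeTensors were created under.
outer_mode = FakeTensorMode(shape_env=ShapeEnv())

class _ReuseOuterMode:
    """Constructor stand-in: ignore all arguments and hand back the
    already-existing mode instead of building a fresh, conflicting one."""
    def __new__(cls, *args, **kwargs):
        return outer_mode

# While the patch is active, internal FakeTensorMode() calls resolve
# through the stand-in and return outer_mode.
with mock.patch.object(
    torch._subclasses.fake_tensor, "FakeTensorMode", _ReuseOuterMode
):
    inner = torch._subclasses.fake_tensor.FakeTensorMode()

assert inner is outer_mode
```

Reusing one mode matters because FakeTensors are tied to the mode that created them; letting `standalone_compile("from_example_inputs")` build a second mode would reject the inputs created under the first.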
Signed-off-by: Richard Zou <zou3519@gmail.com>
Force-pushed from 17d977c to 7c46bdb
…e-size compilation (vllm-project#36093) Signed-off-by: Richard Zou <zou3519@gmail.com>